Word2vec Skip-Gram Dimensionality Selection via Sequential Normalized Maximum Likelihood

Authors

Abstract

In this paper, we propose a novel information-criteria-based approach to select the dimensionality of the word2vec Skip-gram (SG) model. From the perspective of probability theory, SG is considered an implicit probability distribution estimator under the assumption that there exists a true contextual distribution among words. Therefore, we apply information criteria with the aim of selecting the best dimensionality, so that the corresponding model is as close as possible to the true distribution. We examine the following criteria for the dimensionality selection problem: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Sequential Normalized Maximum Likelihood (SNML) criterion. SNML is the total codelength required for the sequential encoding of a data sequence on the basis of the minimum description length principle. The proposed approach is applied to both the original SG model and SG with Negative Sampling to clarify the idea of using information criteria. Additionally, since SNML suffers from computational disadvantages, we introduce novel heuristics for its efficient computation. Moreover, we empirically demonstrate that SNML outperforms both BIC and AIC. In comparison with other evaluation methods for word embeddings, the dimensionality selected by SNML is significantly closer to the optimal dimensionality obtained by word analogy or word similarity tasks.
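To make the penalized-likelihood criteria in the abstract concrete, the following is a minimal sketch of how AIC and BIC trade off model fit against parameter count when comparing candidate embedding dimensionalities. The log-likelihood values, vocabulary size, and parameter count (word plus context vectors) below are purely illustrative assumptions, not numbers from the paper; in practice the log-likelihood would come from training SG at each dimensionality.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical comparison of candidate dimensionalities.
n = 10_000      # number of (word, context) training pairs (illustrative)
vocab = 5_000   # vocabulary size (illustrative)
# dim -> training log-likelihood (illustrative values only)
candidates = {50: -61_000.0, 100: -59_500.0, 200: -59_200.0}

for dim, ll in candidates.items():
    k = 2 * vocab * dim  # parameters: a word vector and a context vector per type
    print(f"dim={dim:4d}  AIC={aic(ll, k):12.1f}  BIC={bic(ll, k, n):12.1f}")
```

BIC's ln(n) factor penalizes each extra parameter more heavily than AIC's constant 2 once n > e^2, which is why the two criteria can disagree on the selected dimensionality; SNML replaces both penalties with a codelength computed by sequential normalization.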


Similar articles

Model selection by normalized maximum likelihood

The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model...


Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in On-line Density Estimation

The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood (“follow the leader”) strategy. We find that this strategy is (1) suboptimal and (2) requires an additional assumption about boundedness of the data sequence. We then show that both problem...


Dynamic Word Embeddings via Skip-Gram Filtering

We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec (Mikolov et al., 2013b). These embedding vectors are connected in time thro...


On Sequentially Normalized Maximum Likelihood Models

The important normalized maximum likelihood (NML) distribution is obtained via a normalization over all sequences of given length. It has two shortcomings: the resulting model is usually not a random process, and in many cases, the normalizing integral or sum is hard to compute. In contrast, the recently proposed sequentially normalized maximum likelihood (SNML) models always comprise a random...
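The per-symbol normalization that distinguishes SNML from batch NML can be written down directly. The following is a sketch in standard notation (not taken verbatim from this abstract), where $\hat\theta(\cdot)$ denotes the maximum-likelihood estimate computed from the indicated data:

```latex
\ell_{\mathrm{SNML}}(x_t \mid x^{t-1})
  = -\log \frac{p\bigl(x_t \mid x^{t-1};\, \hat\theta(x^t)\bigr)}
               {\sum_{x'} p\bigl(x' \mid x^{t-1};\, \hat\theta(x^{t-1}, x')\bigr)},
\qquad
L_{\mathrm{SNML}}(x^n) = \sum_{t=1}^{n} \ell_{\mathrm{SNML}}(x_t \mid x^{t-1}).
```

Because the normalization runs over the single next symbol $x'$ rather than over all length-$n$ sequences, the codelengths chain into a valid random process and the normalizing sum is far cheaper to evaluate than the NML normalizer.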


word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA

Mikolov et al. (2013) introduced the skip-gram formulation for neural word embeddings, wherein one tries to predict the context of a given word. Their negative-sampling algorithm improved the computational feasibility of training the embeddings. Due to their state-of-the-art performance on a number of tasks, there has been much research aimed at better understanding it. Goldberg and Levy (2014)...



Journal

Journal title: Entropy

Year: 2021

ISSN: 1099-4300

DOI: https://doi.org/10.3390/e23080997